GH-15100: [C++][Parquet] Add benchmark for reading strings from Parquet by wjones127 · Pull Request #15101 · apache/arrow

wjones127 · 2022-12-27T23:54:41Z

Closes: Add benchmarks for reading and writing strings #15100

github-actions · 2022-12-27T23:55:08Z

Closes: Add benchmarks for reading and writing strings #15100

wjones127 · 2022-12-28T15:12:12Z

@ursabot please benchmark command=cpp-micro --suite-filter=parquet-arrow-reader-writer-benchmark

wjones127 · 2022-12-29T21:37:33Z

@ursabot please benchmark command=cpp-micro --suite-filter=parquet-arrow-reader-writer-benchmark

wjones127 · 2022-12-30T02:37:24Z

@ursabot please benchmark

ursabot · 2022-12-30T02:37:29Z

Benchmark runs are scheduled for baseline = 6236dba and contender = 3c02495. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️2.04% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️1.81% ⬆️0.14%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3c02495f ec2-t3-xlarge-us-east-2
[Failed] 3c02495f test-mac-arm
[Finished] 3c02495f ursa-i9-9960x
[Finished] 3c02495f ursa-thinkcentre-m75q
[Finished] 6236dbac ec2-t3-xlarge-us-east-2
[Failed] 6236dbac test-mac-arm
[Finished] 6236dbac ursa-i9-9960x
[Finished] 6236dbac ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2022-12-30T06:42:01Z

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

pitrou · 2023-01-03T10:58:38Z

+      ::arrow::schema({::arrow::field("column", type, null_percentage > 0)}), {arr});
+}
+
+static void BM_WriteBinaryColumn(::benchmark::State& state) {


Does it use the PLAIN encoding? Add a comment?

wjones127 · Jan 3, 2023

I added a comment near the parameters of each benchmark explaining that we use unique_values to trigger the code paths for dictionary and plain encodings. I also tried to add a check within the benchmark to validate that we get the expected encodings, but it turned out to be too complicated: the encodings can change from page to page, and they also apply to the definition and repetition levels (IIUC).

pitrou · Jan 4, 2023

I see. Can you just confirm that the expected encodings are used (and add a comment)?

pitrou · Jan 4, 2023

Just saw the comment below, sorry. Please disregard. :-)
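To illustrate the point discussed in this thread, here is a minimal standalone sketch of how the number of unique values can steer a column toward dictionary or plain encoding. It does not use the Parquet API and is not the code from this PR. The helpers `MakeStrings`, `CountDistinct`, and `DictionaryLikelyUsed`, and the `dict_byte_limit` parameter, are hypothetical names introduced for illustration. The sketch assumes the writer starts with dictionary encoding and falls back to plain encoding once the dictionary grows past a size limit; the actual behavior depends on the writer properties in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the PR): builds `num_values` strings drawn
// uniformly from a pool of `num_unique` distinct values.
std::vector<std::string> MakeStrings(std::size_t num_values,
                                     std::size_t num_unique,
                                     std::uint32_t seed = 42) {
  assert(num_unique > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, num_unique - 1);
  std::vector<std::string> out;
  out.reserve(num_values);
  for (std::size_t i = 0; i < num_values; ++i) {
    out.push_back("value_" + std::to_string(pick(rng)));
  }
  return out;
}

// Counts how many distinct strings actually occur in `values`.
std::size_t CountDistinct(const std::vector<std::string>& values) {
  return std::unordered_set<std::string>(values.begin(), values.end()).size();
}

// Assumption: a writer keeps dictionary encoding only while the dictionary
// stays under some byte limit. Each distinct entry is costed as its bytes
// plus a 4-byte length prefix. Returns false as soon as the limit is exceeded.
bool DictionaryLikelyUsed(const std::vector<std::string>& values,
                          std::size_t dict_byte_limit) {
  std::unordered_set<std::string> seen;
  std::size_t dict_bytes = 0;
  for (const auto& v : values) {
    if (seen.insert(v).second) {
      dict_bytes += v.size() + sizeof(std::uint32_t);
      if (dict_bytes > dict_byte_limit) return false;
    }
  }
  return true;
}
```

With a small pool (for example 32 unique values) the estimated dictionary stays tiny and the dictionary path would be expected; with as many unique values as rows, the estimate quickly exceeds a small limit and the plain path would be expected. As noted in the thread, real files can mix encodings from page to page, so this is only a rough model.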

… Parquet (apache#15101)
* Closes: apache#15100
Authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

ursabot · 2023-01-05T21:12:45Z

Benchmark runs are scheduled for baseline = 25b5093 and contender = 040310f. 040310f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️8.15% ⬆️6.76%] test-mac-arm
[Finished ⬇️0.26% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.47% ⬆️0.17%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 040310fe ec2-t3-xlarge-us-east-2
[Failed] 040310fe test-mac-arm
[Finished] 040310fe ursa-i9-9960x
[Finished] 040310fe ursa-thinkcentre-m75q
[Finished] 25b50932 ec2-t3-xlarge-us-east-2
[Failed] 25b50932 test-mac-arm
[Finished] 25b50932 ursa-i9-9960x
[Finished] 25b50932 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2023-01-05T21:13:02Z

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm
ursa-i9-9960x

ursabot · 2023-01-06T02:52:36Z

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm

feat(bench): add benchmark for reading strings from Parquet.

5df5ede

wjones127 changed the title ~~GH-15100: [C++][Parquet] Add benchmark for reading strings from Parque~~ GH-15100: [C++][Parquet] Add benchmark for reading strings from Parquet Dec 27, 2022

feat(bench): add write benchmark

07ec943

github-actions Bot added Component: C++ Component: Parquet labels Dec 28, 2022

cleanup

3c02495

wjones127 marked this pull request as ready for review December 28, 2022 16:10

pitrou reviewed Jan 3, 2023

Comment thread cpp/src/parquet/arrow/reader_writer_benchmark.cc Outdated

pr feedback

4a05c5a

wjones127 requested a review from pitrou January 4, 2023 21:13

pitrou approved these changes Jan 5, 2023

pitrou merged commit 040310f into apache:master Jan 5, 2023
